-
Notifications
You must be signed in to change notification settings - Fork 1
refactor: modular pipeline layout (sources → transforms → outputs → core → cli) #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…i); preserve CLI behavior; add tsconfig; update build/bin
…/outputs and plugging in custom adapters
…public API for adapters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the monolithic CLI into a modular pipeline architecture with clear separation of concerns: sources → transforms → outputs → core → cli. The refactor establishes a foundation for adding new input sources and output formats while maintaining full backward compatibility with existing CLI flags and behavior.
Key changes:
- Extracted core types, utilities, and CLI arg parsing into
src/core/types.ts - Created modular source adapter (
src/sources/twitter.ts) for Twitter/X archive ingestion - Separated transform logic (
src/transforms/core.ts) for filtering, grouping, and conversation mapping - Isolated output writers (
src/outputs/writers.ts) for Markdown, OAI, JSONL, ShareGPT, and stats - Introduced public library API (
src/index.ts) with pluggable adapter interfaces
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/transforms/core.ts | Text cleaning, filtering, thread grouping, and conversation-to-message conversion logic |
| src/sources/twitter.ts | Twitter/X archive detection and ingestion with media handling |
| src/outputs/writers.ts | Writers for Markdown, OAI JSONL, normalized JSONL, ShareGPT, and stats formats |
| src/index.ts | Public library API exposing types, adapters, and extension interfaces |
| src/core/types.ts | Shared types, CLI argument parsing, logger, and utility functions |
| src/cli/splice.ts | CLI entrypoint orchestrating the modular pipeline |
| package.json | Updated main entry point to dist/index.js for library usage |
| README.md | Enhanced documentation with architecture overview and library usage examples |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/transforms/core.ts
Outdated
| } | ||
|
|
||
| for (const it of items) { | ||
| const role: Role = it.raw && "full_text" in (it.raw as any) ? "assistant" : "user"; |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The role assignment logic uses an implicit type check with as any and a property existence check. This is fragile and unclear. Consider extracting this to a named function like inferRole(item: ContentItem): Role with explicit documentation of the heuristic being used.
package.json
Outdated
| ".": "./dist/index.js" | ||
| }, | ||
| "bin": { | ||
| "splice": "dist/splice.js" |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bin entry points to 'dist/splice.js' but according to the PR description and README changes, the CLI should be at 'dist/cli/splice.js'. This will cause the published package's CLI command to fail.
| "splice": "dist/splice.js" | |
| "splice": "dist/cli/splice.js" |
src/cli/splice.ts
Outdated
| const validFormats = requested.filter((f) => allowedFormats.has(f)); | ||
| const invalidFormats = requested.filter((f) => !allowedFormats.has(f)); | ||
| for (const bad of invalidFormats) { | ||
| logger("warn", `Unknown format "${bad}". Supported: markdown, oai, json`); |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message lists 'markdown, oai, json' as supported formats but omits 'sharegpt' which is in the allowedFormats set (line 191). The message should include all supported formats.
src/cli/splice.ts
Outdated
| if (formatSpecified && validFormats.length === 0) { | ||
| logger( | ||
| "error", | ||
| "No valid formats requested. Supported: markdown, oai, json", |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message also omits 'sharegpt' from the list of supported formats. Should be consistent with the allowedFormats set.
What
Why
Compatibility
Tests